# Bit Ordering

GitHub user [@glandium] filed this very straightforward [issue #24]:

> In some cases, it is not interesting to deal with a specific endianness, and
> it would be more useful to use the native endianness whatever it is. One could
> import the right endianness depending on the target endianness, but it would
> be more convenient if there was a "NativeEndian" to avoid the manual
> gymnastic.

This question is not answerable in any meaningful way.

I am going to describe byte-level endianness, and why it exists, before I
explain why bit-level endianness in a processor doesn’t make sense. Feel free to
[skip](#bit-level-endianness). There is also a [summary](#summary).

## What Is Endianness

“Endianness” is a very bad jargon word for “the order in which a wide object is
fractured and placed in a narrow channel”. It is fancy computer-ese for “how do
we turn this sideways to make it fit”.

Endianness is a protocol for translating an abstract concept into electrical
units.

### What Is *Memory*

Computers use the “byte” as the atom – the fundamental, unsplittable, base-case
unit – of memory. It so happens that for most of us, for the past
forty years and likely for the rest of our civilization, the byte will be eight
“bits” wide. The bit is the *actual* atom of electronic memory, but the bit can
hold very little information and eight of them can hold much more, so we make
basically all of our computers – and certainly all the computers about which you
care, if you’re reading this – use the eight-bit byte as the atom.

An eight-bit byte is this wide:

```text
[ _ _ _ _ _ _ _ _ ]
```

But this is still pretty narrow. These days, we typically work with much wider
objects. Consider a `u32`, a 32-bit value. It is this wide:

```text
[ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ]
```

Which is *far* too wide to put in a byte of memory.

Fortunately, our memory is a very long sequence of bytes:

```text
…[ _ _ _ _ _ _ _ _ ][ _ _ _ _ _ _ _ _ ][ _ _ _ _ _ _ _ _ ][ _ _ _ _ _ _ _ _ ]…
```

And there we have 32 bits, which can hold our 32-bit value. But we have to
choose how to fracture the `u32` into four 8-bit chunks (this is an easy
choice: 32 evenly divides by 8, so we get four equal parts) and then how to put
each part in each byte.
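
As a minimal sketch of the fracturing step alone, the four 8-bit parts can be peeled off with shifts. The helper name `split_u32` is hypothetical and only used for illustration. It lists the parts from most significant to least significant, and says nothing yet about which memory address each part lands in.

```rust
/// Splits a `u32` into four 8-bit parts, most significant part first.
/// `split_u32` is a hypothetical name chosen for this illustration.
fn split_u32(value: u32) -> [u8; 4] {
    [
        (value >> 24) as u8, // bits 31..=24
        (value >> 16) as u8, // bits 23..=16
        (value >> 8) as u8,  // bits 15..=8
        value as u8,         // bits 7..=0
    ]
}
```

For example, `split_u32(0x1234_5678)` yields the parts `0x12`, `0x34`, `0x56`, and `0x78`.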

### Ordering the Unordered

This is a surprisingly hard choice! You, the human being reading this, probably
parse your written text sequences from left to right, because this document is
written in the English language and English writes text from left to right.

English also writes its numbers in “big-endian” order: the most “significant”
digit – that is, the digit with the most effect on the absolute value of the
parsed number – on the left, and each digit to the right has successively less
influence on the overall number’s value.

This is great for human use, because we can stop caring about the digits at any
point in the string and just count the rest of them rather than *do math* and
still get a pretty good idea of what the number is.

Computers don’t need to approximate how big a number is. They always know how
wide it is, because we tell them: it’s eight bits, or sixteen, or thirty-two, or
sixty-four, or one-hundred-twenty-eight, or so on as CPU vector registers
expand. Computers tend to care more about the *least* significant digits, and
placing those first is “little-endian” order.

The tension between favoring the human and favoring the machine is why we have
the two most common endianness orderings. If you recite eight digits of
hexadecimal for a 32-bit number, then when you read a big-endian memory span
from the lower address to the higher address, the digits are in the same order
in the memory span as your recitation. If that memory is instead little-endian,
you can usually answer questions like “is it non-zero” very quickly, because the
least significant byte is directly at the starting address and not three bytes
higher in memory. We don’t tend to work with numbers that are exact multiples of
256, where the entire lowest byte is zero, very often.

### Translation To Memory

So let’s go back to our diagram. The 32-bit value is divided into four bytes,
and placed in bytes in one of two orders:

```text
[ _ Most Significant bit                           Least Significant bit _ ]
[ 0 0 0 0 0 0 0 0 ][ 1 1 1 1 1 1 1 1 ][ 2 2 2 2 2 2 2 2 ][ 3 3 3 3 3 3 3 3 ]
[ 3 3 3 3 3 3 3 3 ][ 2 2 2 2 2 2 2 2 ][ 1 1 1 1 1 1 1 1 ][ 0 0 0 0 0 0 0 0 ]
```

In the diagram, all three rows are written as the number would be on paper: the
most significant bit is on the left, and the least on the right. The numbers in
the bottom two rows indicate the byte address offsets of each chunk: the middle
row has its most significant byte at the start and its least at the end, and the
bottom row goes the other direction.

Notice that the numbers count to 3, not to 31. There is no difference in address
for each bit in the byte, and there is no meaningful way to describe “order” for
bits because they have no numerical distinction, and there is no required
physical order to them.
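
The two byte layouts in the diagram can be reproduced with the standard library’s `u32::to_be_bytes` and `u32::to_le_bytes`. This is only a sketch: the wrapper name `layouts` is hypothetical, and the array index in each result plays the role of the byte address offset.

```rust
/// Returns the big-endian and little-endian byte layouts of `value`.
/// In each array, index 0 is the lowest address offset.
/// `layouts` is a hypothetical name chosen for this illustration.
fn layouts(value: u32) -> ([u8; 4], [u8; 4]) {
    (value.to_be_bytes(), value.to_le_bytes())
}
```

For `0x1234_5678`, the big-endian layout starts with `0x12` at offset 0, and the little-endian layout starts with `0x78`. Neither call says anything about the bits inside each byte.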

We think of CPU registers as being a contiguous series of wires that count
monotonically from `0` on one edge to something on the other edge, but this is a
human representation. The electrical layout is not required to look anything
like this, as long as the abstract concept of neighborliness holds up.

CPUs do not impose a software-observable ordering on bit lines. Not in
registers, not in memory. Software cannot observe the ordering of bit lines. The
only access software has to bits in an element is through the shift instructions
`<<` and `>>` and the mask instructions `&`, `|`, and `^`, and these
instructions *all* present the abstract concept of an integer fundamental to
software.
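
A small sketch of what that means in practice: the helpers below (hypothetical names `bit` and `rebuild`) touch individual bits using only shifts and masks. Nothing in them mentions byte order or wire order, and they behave the same on every target.

```rust
/// Reads bit `n` of `value`, where bit 0 is the least significant bit.
/// `bit` is a hypothetical name chosen for this illustration.
fn bit(value: u32, n: u32) -> bool {
    (value >> n) & 1 == 1
}

/// Rebuilds `value` one bit at a time, using only shifts and masks.
fn rebuild(value: u32) -> u32 {
    let mut out = 0u32;
    for n in 0..32 {
        if bit(value, n) {
            out |= 1 << n;
        }
    }
    out
}
```

Whatever the memory layout of the machine, `bit(x, 0)` is the one’s place and `bit(x, 31)` is the top bit of a `u32`.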

## Bit-Level Endianness

Let’s go back to the oversimplification of “endianness is turning a wide value
sideways so it fits in a narrow channel”. We’ve established that there are two
primary directions to turn.

Processors don’t have any single-bit-wide channels for data. They have plenty
for control signals, but data paths are always some multiple of eight bits wide.
Once something is a sequence of bytes, those bytes don’t need to narrow anymore
to go across data paths in a processor.

I/O ports, however, *are* single-bit data paths. Bytes *do* need to do the same
narrowing and reordering process that wide integers do in order to cross those
wires. Each protocol defines a timing specification that translates a byte into
bits spread over clock ticks; some in “big-endian” where the MSbit transmits at
tick zero, the next bit down at tick one, and so on until the LSbit transmits at
the final tick; some in “little-endian” where the LSbit is at tick zero and the
MSbit at the final tick.
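
As an illustration of the two timing orders, the sketch below models a wire as an array of eight ticks. The function names `msb_first` and `lsb_first` are hypothetical, and real protocols add framing, clocks, and electrical details that are not modeled here.

```rust
/// Bits of `byte` in transmission order when the MSbit goes out at tick zero.
/// `msb_first` is a hypothetical name chosen for this illustration.
fn msb_first(byte: u8) -> [bool; 8] {
    let mut ticks = [false; 8];
    for (tick, slot) in ticks.iter_mut().enumerate() {
        // Tick 0 carries bit 7, tick 7 carries bit 0.
        *slot = (byte >> (7 - tick)) & 1 == 1;
    }
    ticks
}

/// Bits of `byte` in transmission order when the LSbit goes out at tick zero.
fn lsb_first(byte: u8) -> [bool; 8] {
    let mut ticks = [false; 8];
    for (tick, slot) in ticks.iter_mut().enumerate() {
        // Tick 0 carries bit 0, tick 7 carries bit 7.
        *slot = (byte >> tick) & 1 == 1;
    }
    ticks
}
```

The same byte produces mirror-image tick sequences under the two orders, which is why both ends of a wire must agree on one of them.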

This timing specification defines both how a byte is sent one bit at a time
across a wire, and how a wire’s states over time are reëxpanded into a byte.
This is performed in hardware controllers that are built specifically to
implement the protocol, and communicate with the processor’s data paths. This
means that the hardware controllers are always correctly connected so all data
paths in the system agree on what the MSbit and LSbit of a data path are.
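
The receiving half of such a specification can be sketched the same way. The functions below (hypothetical names) take eight wire states in arrival order and rebuild the byte. This is a software model of what a hardware controller does, not a description of any particular controller.

```rust
/// Rebuilds a byte from eight wire states when the first tick is the MSbit.
/// `from_msb_first` is a hypothetical name chosen for this illustration.
fn from_msb_first(ticks: [bool; 8]) -> u8 {
    // Each new bit pushes the earlier bits one place higher.
    ticks.iter().fold(0u8, |acc, &bit| (acc << 1) | bit as u8)
}

/// Rebuilds a byte from eight wire states when the first tick is the LSbit.
fn from_lsb_first(ticks: [bool; 8]) -> u8 {
    // Walk the ticks backwards so the last-arrived bit ends up highest.
    ticks.iter().rev().fold(0u8, |acc, &bit| (acc << 1) | bit as u8)
}
```

Feeding the same tick sequence to both functions gives different bytes, so a receiver that assumes the wrong order silently reverses the bits of every byte.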

### So Why Does `bitvec` Have Cursors

Precise construction of I/O buffers. That is the only reason. The cursors are
meaningless for construction of buffers that stay in main memory.

### Is There a Native Bit Endianness for a Processor

No.

### How Do I Select a Cursor Trait

Use the one that matches your I/O source. If the first bit produced by a source
is the most significant bit in the element, use `BigEndian`; if the first bit
produced is the least significant, use `LittleEndian`.

### Why Not Name Them `Ltr` and `Rtl`

The directions “left” and “right” are already meaningless in software, and
furthermore, they do not have a unique mapping to the concepts we actually want,
which are “first” and “last”.

## Summary

`<BigEndian, u8>` is a “good-enough” default. Switch to `<LittleEndian, u8>` if
you are building numbers rather than text. You should only use larger
fundamentals if you have a very specific need for I/O buffer layout, or if you
are using the crate for bit sets and do not care about buffer layout at all. The
wider fundamentals probably have better memory performance.

You should also type-alias your chosen layout pair:

```rust
use bitvec::prelude::*; // assumed import path for the cursor and container types

type MySpecificSlice = BitSlice<LittleEndian, u32>;
type MySpecificVec = BitVec<LittleEndian, u32>;
```

and think about these details as little as possible.

[@glandium]: https://github.com/glandium
[issue #24]: https://github.com/myrrlyn/bitvec/issues/24
